Infinite Notebook Loop in Fabric Using Data Activator

Data Activator allows us to build an interesting workaround for infinite notebook execution.

An infinite execution loop is useful for constant data ingestion, but Fabric doesn’t allow an infinite loop inside a notebook: the execution ends up timing out and failing.

Scheduling the notebook on very tight schedules is one possibility, but it’s not very trustworthy. You would need to know the precise run duration of the notebook, and it isn’t always the same. You end up either with large gaps between executions or with overlapping schedules. Neither option is good.

There is an interesting alternative we can use. Here comes the workaround.

The Architecture of the workaround

The solution is to create a signal indicating whether the notebook is running or not. We can do this using a control table. A “control table” is a regular table in a lakehouse, but we use it with the purpose of controlling the execution flow of the notebook.

This table needs two fields: the date and time of the record insertion and an integer value which we will set to 0 or 1.
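As an illustration, here is a minimal sketch of such a table, assuming the notebook’s default lakehouse and an illustrative table name of notebook_control:

    # "spark" is the Spark session a Fabric notebook exposes by default.
    # The table name "notebook_control" is illustrative.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS notebook_control (
            insert_time TIMESTAMP,  -- date and time of the record insertion
            is_running  INT         -- 1 = notebook started, 0 = notebook finished
        )
    """)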

The notebook we want to run needs to make an insert at the beginning of the execution and another at the end. At the beginning, we insert a record with the value 1, to indicate the notebook is executing. At the end, we insert a record with the value 0.

We can use Data Activator to monitor this table. Every time a record with the value 0 is inserted, we execute the notebook again. In this way, Data Activator becomes responsible for the infinite loop.
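A minimal sketch of a helper for these inserts, assuming the notebook_control table from the previous snippet:

    from datetime import datetime, timezone

    def write_control_record(value: int) -> None:
        """Insert a control record: 1 when the notebook starts, 0 when it ends."""
        df = spark.createDataFrame(
            [(datetime.now(timezone.utc), value)],
            schema="insert_time timestamp, is_running int",
        )
        # Append the record to the control table in the lakehouse.
        df.write.mode("append").saveAsTable("notebook_control")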

Ensuring the final Insert in the Table

It’s essential that the final insert always executes at the end of the notebook, even if an error happens.

If this final insert is not executed, the loop breaks. That’s why we need to use a try/finally structure to ensure the final insert is executed.
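A minimal sketch of this structure, assuming the write_control_record helper above and a hypothetical run_ingestion function holding the real work:

    def run_ingestion() -> None:
        # Placeholder for the actual ingestion logic, broken down into functions.
        ...

    write_control_record(1)       # signal: the notebook is running
    try:
        run_ingestion()           # the actual work
    finally:
        write_control_record(0)   # always signal the end, even on failure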

The main execution code could be long. The best way to organize the execution is to break the code down into functions. The main code structure ends up being simple: only the try/finally exemplified above and calls to the functions which do the actual work.

This makes the core code quite simple.

Data Activator delay and the Notebook code

Data Activator has a 5-minute delay to execute a trigger. We always need to take this delay into account.

You can extend the intervals between these breaks as much as possible by increasing the execution time of the notebook. I mean that the notebook itself can and should contain an execution loop.

The notebook loop can’t be infinite, but you can extend it as far as possible without making the execution unstable. Data Activator will ensure the notebook is executed again after it finishes.

For example, I implemented scenarios where the notebook runs for 1 hour and 50 minutes. When it stops, it depends on Data Activator to be triggered again. In this way, every 1 hour and 50 minutes, a 5-minute gap can occur.
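A minimal sketch of such a time-bounded loop, assuming a hypothetical process_new_data function and the 1 hour and 50 minutes budget from the example:

    import time

    MAX_RUNTIME_SECONDS = 110 * 60      # 1 hour and 50 minutes

    def process_new_data() -> None:
        # Placeholder for one iteration of the ingestion work.
        ...

    start = time.monotonic()
    while time.monotonic() - start < MAX_RUNTIME_SECONDS:
        process_new_data()
        time.sleep(30)                  # small pause between iterations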

Configuring Data Activator

Data Activator requires one of the following to be configured:

  • An Eventstream with real-time ingestion
  • A report visual
  • A Kusto query in a real-time dashboard

Considering the scenario we have, with a lakehouse table, I consider the Kusto query in a real-time dashboard to be the best option.

I wrote before about how to use a Kusto Database and shortcuts to allow lakehouses to use real-time dashboards and Data Activator.

The main query is very simple: it reads the data from the control table. The real-time dashboard requires the query to use a time range filter to allow us to set the alert.

We configure the alert to be triggered each time the value is set to 0.

Analyzing the execution time

An additional benefit we get from this method is the ability to analyze the execution times.

  • The time difference from a record with value 1 to the next record with value 0 tells us the execution duration of the notebook.
  • The time difference from a record with value 0 to the next record with value 1 tells us the interval between one execution and the next.

By creating queries with these calculations, we can build a dashboard to analyze the execution process.
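As an illustration, here is a minimal PySpark sketch of these two calculations, assuming the notebook_control table used earlier; the dashboard queries follow the same idea:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # A single global window is fine for a small control table.
    w = Window.orderBy("insert_time")

    control = (
        spark.table("notebook_control")
        .withColumn("prev_value", F.lag("is_running").over(w))
        .withColumn("prev_time", F.lag("insert_time").over(w))
        .withColumn(
            "seconds_elapsed",
            F.col("insert_time").cast("long") - F.col("prev_time").cast("long"),
        )
    )

    # 1 -> 0 : duration of a notebook execution
    executions = control.filter((F.col("prev_value") == 1) & (F.col("is_running") == 0))
    # 0 -> 1 : interval between two executions
    intervals = control.filter((F.col("prev_value") == 0) & (F.col("is_running") == 1))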

Summary

This is a very clever and reliable system for continuous, infinite execution. The only problem is the Data Activator delay, which may be too long for some scenarios.

About the author

Dennes Torres

Dennes Torres is a Data Platform MVP and Software Architect living in Malta who loves SQL Server and software development and has more than 20 years of experience. Dennes can improve Data Platform architectures and transform data into knowledge. He moved to Malta after more than 10 years leading the devSQL PASS Chapter in Rio de Janeiro and is now a member of the leadership team of the MMDPUG PASS Chapter in Malta, organizing meetings, events, and webcasts about SQL Server. He is an MCT and MCSE in Data Platforms and BI, with more titles in software development. You can get in touch on his blog https://dennestorres.com or at his work https://dtowersoftware.com